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Abstract —In many applications, the observations can be rep¬ 
resented as a signal defined over the vertices of a graph. The 
analysis of such signals requires the extension of standard signal 
processing tools. In this work, first, we provide a class of 
graph signals that are maximally concentrated on the graph 
domain and on its dual. Then, building on this framework, we 
derive an uncertainty principle for graph signals and illustrate 
the conditions for the recovery of band-limited signals from 
a subset of samples. We show an interesting fink between 
uncertainty principle and sampling and propose alternative signal 
recovery algorithms, including a generalization to frame-based 
reconstruction methods. After showing that the performance 
of signal recovery algorithms is significantly affected by the 
location of samples, we suggest and compare a few alternative 
sampling strategies. Finally, we provide the conditions for perfect 
recovery of a useful signal corrupted by sparse noise, showing 
that this problem is also intrinsically related to vertex-frequency 
localization properties. 

Index Terms —signals on graphs, uncertainty principle, sam¬ 
pling, sparse noise, frames 

I. Introduction 

I N many applications, from sensor to social networks, 
transportation systems, gene regulatory networks or big 
data, the signals of interest are defined over the vertices of 
a graph 0 > 0 - Over the last few years, a series of papers 
produced a significant advancement in the development of 
tools for the analysis of signals defined over a graph, or graph 
signals for short 0 , 0 - One of the unique features 

in graph signal processing is that the analysis tools come 
to depend on the graph topology. This paves the way to a 
plethora of methods, each emphasizing different aspects of the 
problem. A central role is played by spectral analysis of graph 
signals, which passes through the introduction of the Graph 
Fourier Transform (GFT). Alternative definitions of GFT have 
been proposed, see, e.g., 0-0, looking at the problem from 
different perspectives: 0, apply to undirected graphs 

and build on the spectral clustering properties of the Laplacian 
eigenvectors and the minimization of the G-norm graph total 
variation; 0,0 define a GFT for directed graphs, building on 
the interpretation of the adjacency operator as the graph shift 
operator, which lies at the heart of all linear shift-invariant 
filtering methods for graph signals (8)|, [9 ] [] Building on G3- 
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one could also introduce a GFT for directed graphs based on 
the eigendecomposition of the modified Laplacian for directed 
graphs introduced in | f2] . 

After the introduction of the GFT, an uncertainty principle 
for graph signals was derived in 1131 and, more recently, in 
d), (H), 00- The aim of these works was to establish a 
link between the spread of a signal on the vertices of the 
graph and the spread of its spectrum, as defined by the GFT, 
on the dual domain. The further fundamental contribution 
was the formulation of a sampling theory aimed at finding 
the conditions for recovering a graph signal from a subset 
of samples: A seminal contribution was given in [;5jj, later 
extended in 03’ ID and, very recently, in 0, o?f, 00. 
In the following, after introducing the notation, we briefly 
recall the background of graph signal processing and then we 
highlight the specific contributions of this paper. 

A. Notation and Background 

We consider a graph Q = (V,£) consisting of a set of N 
nodes V = {1,2, ...,7V}, along with a set of weighted edges 
£ = {ctij}i,jeV’ such that > 0, if there is a link from 
node j to node i, or = 0, otherwise. The symbol <S 
denotes the cardinality of set S, i.e., the number of elements 
of S. The adjacency matrix A of a graph is the collection of 
all the weights ctij,i,j = 1,... ,7V. The degree of node i is 
ki := LjLi a i.i■ The degree matrix is a diagonal matrix having 
the node degrees on its diagonal: K = diag {fci, k’ 2 ,..., Uv}. 
The combinatorial Laplacian matrix is defined as L = K — A. 
In the literature, it is also common to use the normalized graph 
Laplacian matrix C = K TK 'U a signal x over a graph 
Q is defined as a mapping from the vertex set to the set of 
complex numbers, i.e. x : V —>■ C. We denote by || • || the 
G-norm of a signal, i.e. ||a;|| 2 = l a 'i| 2 - We recall now 

the basic background for better clarifying the contributions of 
our work. 

Let us introduce the eigen-decomposition of the Laplacian 
matrix 

N 

L = UEU* = Cl) 

i-1 

where H is a diagonal matrix with non-negative real eigen¬ 
values {£*} on its diagonal and i = 1,... ,7V, are the 

real-valued orthonormal eigenvectors; the symbol (•)* denotes 
conjugate transpose. The Graph Fourier Transform x of a 
signal x defined over an undirected graph has been defined 
in 0, 0, 0, 0, as 


x = U*:c 


( 2 ) 
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where U is the unitary matrix whose columns are the Lapla- 
cian eigenvectors. One of the motivations for projecting the 
signal x onto the subspace spanned by the eigenvectors of 
L, as in 0, is that these eigenvectors encode some of the 
graph topological features. For example, they are known for 
exhibiting spectral clustering properties |21J, (22). Hence, the 
GFT defined in ([2} is useful for emphasizing clustered signal 
components, i.e. signals that are smooth within a cluster, but 
are allowed to vary arbitrarily across different clusters. In this 
work, we assume the GFT to be defined as in (J2), where U 
is only required to be a unitary matrix. In most numerical 
examples we assume U to be composed by the eigenvectors 
of the Laplacian matrix, but all derivations are not restricted to 
that choice. This means that all theoretical findings are valid 
for any mapping from primal to dual domain described by 
a unitary operator U. Given a subset of vertices S C V, we 
define a vertex-limiting operator as a diagonal matrix D 5 such 
that 

D 5 = Diag{l 5 }, (3) 

where I 5 is the set indicator vector, whose v'-th entry is equal 
to one, if i £ S, and zero otherwise. Similarly, given the 
unitary matrix U used in Q, and a subset of indices T C 
V*, where V* = {1,..., N} denotes the set of all frequency 
indices, we introduce the operator 

Bjr = USjf-U*, (4) 

where Sjr is a diagonal matrix defined as Sjr = Diagjljr}. 
The role of B jr is to project a vector x onto the subspace 
spanned by the columns of U whose indices belong to 
T. It is immediate to check that both matrices D 5 and 
Bjr are self-adjoint and idempotent, so that they represent 
orthogonal projectors. In the sequel, Vg denotes the set 
of all S- vertex-limited signals, i.e. satisfying D 5 x = x, 
whereas Bjr denotes the set of all ^-band-limited signals, i.e. 
satisfying Bjr x = x. In the rest of the paper, whenever there 
will be no ambiguities in the specification of the sets, we will 
drop the subscripts referring to the sets, to avoid overcrowded 
symbols. Given a set S, we denote its complement set as 
S , such that V = S U S and S n S = 0. Correspondingly, 
we define the vertex-projector onto S as D. Similarly, the 
projector onto the complement set T is denoted by B. 


1) Uncertainty principle: A fundamental property of 
continuous-time signals is the Heisenberg uncertainty princi¬ 
ple, stating that there is a basic trade-off between the spread 
of a signal in time and and the spread of its spectmm in 
frequency. In particular, a continuous-time signal cannot be 
perfectly localized in both time and frequency domains (see, 
e.g., (23) for a survey on the uncertainty principle). More 
specifically, given a continuous-time signal x(t) and its Fourier 
transform X(f), introducing the time spread 


2 _ /^l( i -^) 2 k(f )| 2 df 


(5) 


/-!^o * |s(*)I 2 dt 
fZo \ x (*) I 2 dt 


and the frequency spread 

/_°° 00 (/-/o) 2 |X (/)| 2 df 


A 2 - 
A / - 


with 


/o = 


fZo \ X (f)\' 2 d / 
!Zof\ X (f) I 2 d/ 


( 6 ) 


fZ |A '(/)| 2 df ’ 

the uncertainty principle states that 

A?Af > 


/ - 


(4-7t) 2 


After the introduction of the GFT, an uncertainty principle for 
signals defined over undirected connected graphs was derived 
in 113]. In particular, denoting by d(u, v) the geodesic distance 
between nodes u and v, i.e. the length of the shortest path 
connecting u and v, the spread of a vector x in the vertex 
domain was defined in m as 

1 — 0 (7) 


A 2 := 


mm -— 

n 0 GV \\x 


— T* P ^ T 
12 ^ * u o'*' 


where P „ 0 := diag(d(M 0 , u), d(u 0 , t> 2 ),..., d(n 0 , v N )). Sim¬ 
ilarly, the spread in the GFT domain was defined as 

1 


A 2 := 


£6 


( 8 ) 


where was defined in 0 - The two definitions of spread 
in the graph and its dual domain given in ([7]) and ([ 8 ]) are the 
graph counterparts of formulas ([5| and ([ 6 } for continuous-time 
signals. In fl3) , it was studied the tradeoff between the signal 
spread on the graph and on its spectral (dual) domain, i.e. 
between 0 . for a given value of uq, i.e. without performing 
the minimization operation, and (H). 

2) Sampling: One of the basic issues in graph signal 
processing is sampling, whose goal is to find the conditions 
for recovering a band-limited (or approximately band-limited) 
graph signal from a subset of values and to devise suitable 
sampling and recovery strategies. More specifically, a band- 
limited graph signal can be represented as 


x = Us, 


(9) 


where U is an appropriate basis and s is sparse. Typically, U 
coincides with the matrix whose columns are the eigenvectors 
of L. If we denote by S C V the sampling subset, the sampled 
signal can be represented as 

x s = D s tc = D 5 Us, (10) 

where U 5 is defined as in (|3j. The problem of recovering a 
band-limited signal from its samples is then equivalent to the 
problem of solving system ( [T()| , by exploiting the sparsity of s. 
This problem was addressed, for example, in (5), 1 1 7| , Jl9) , 
and 17]. Alternative recovery strategies have been proposed, 
either iterative (24 j, (19) , or not (7). In (5), Jl9| , frame-based 
recovery algorithms have also been proposed. 

A key important remark is that the sampling strategy, i.e. 
the identification of the sampling set S, plays a key role in 
the performance of the recovery algorithms, as it affects the 
conditioning of system (jTOj). It is then particularly important to 
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devise strategies to optimize the selection of the sampling set. 
This problem is conceptually similar to the problem known 
in the literature as experimental design, see, e.g., (25)- (27). 
Sampling strategies for graph signals were proposed in [ |l8| 
and, more recently, in |7J. 


B. Contributions 

The main contribution of this paper is to present a holistic 
framework that unifies uncertainty principle and sampling, 
building on the identification of the class of graph signals that 
are maximally concentrated over the graph and dual domain. 
The specific contributions are listed below. 

1) Uncertainty principle: The definitions of spread in the 
graph and dual domain given in 0 and dS}, as suggested in 
[|1_3 |, are reminiscent of the formulas <[5ji and 0 valid for 
continuous-time signals. They are both based on second order 
moments of the signal distribution over the graph domain and 
on its dual. However, when dealing with graph signals, there 
is an important distinction to be pointed out with respect to 
time signals: While time (or frequency) is a metric space, with 
a well defined notion of distance, the graph domain is not 
a metric space. The vertices of a graph may represent, for 
example, molecules and the signal may be the concentration 
of a molecule in a given mixture. In cases like this, it is not 
obvious how to define a distance between vertices. Since in 
0 the definition of distance enters directly in the computation 
of the spread, it turns out that the uncertainty principle comes 
to depend on the specific definition of distance over the graph. 
The definition of distance given in (T3) makes perfect sense, 
but as pointed out by the authors themselves, it is not the only 
possible choice. When dealing with graphs, other definitions 
of distance have been proposed in the literature, including 
the resistance distance (28) and the diffusion distance (29). 
An open question arises, for example, in the presence of 
multiple shortest paths having the same distance between two 
vertices. In such a case, using the definition 0, the presence 
of multiple paths having the same distance does not affect the 
computation of the spread. However, the presence of multiple 
paths might indicate an easier way for the information to flow 
through the network. In fact, using the definition of resistance 
distance suggested in (28), the distance between two nodes 
comes to depend on the number of shortest paths with the 
same distance connecting them. To avoid all the shortcomings 
associated with the definition of distance over a graph, in this 
paper we use an alternative definition of spread and derive 
an uncertainty principle that does not require any additional 
definition of distance. More specifically, we take inspiration 
from the seminal works of Slepian, Landau and Pollack (30) , 
(31), on prolate spheroidal wave-functions. In those works, 
the effective duration T of a continuous-time signal centered 
around a time instant to was defined as the value such that the 
percentage of energy falling in the interval [to~T/2,t o+T/2] 
assumes a specified value a 2 , i.e. 

H*)i 2 * 2 

SZc K*)l 2 * “ ' 


Similarly, the effective bandwidth W is the value such that 

rfo+W/2 


ISIw/2 W)l 2 d/ 
f-oo l*(/)l 2 d/ 


= p ■ 


We transpose these formulas into the graph domain as follows. 
Given a vertex set S and a frequency set T, using 0 and 0, 
the vectors Da: and Ba: denote, respectively, the projection of 
x onto the vertex set S and onto the frequency set T. Then, we 
denote by a 2 and B 2 the percentage of energy falling within 
the sets S and T , respectively, as 


\Dx\\% _ 2 

ii 11 o QL , 


\\Bx 


= p 


(id 


In this paper, we find the region of all admissible pairs (a, /?), 
by generalizing HD to the discrete case. More specifically, 
we express the boundaries of the admissible region in closed 
form and illustrate which are the signals that attain all the 
points of the admissible region. It is worth noticing that, in 


(111, the graph topology is captured by the matrix U, present 
in the definition of the GFT in (0 which appears inside the 
operator B. The theory presented in this paper is valid for any 
unitary mapping from some discrete space to its dual. 

2) Sampling: Building on the construction of a basis of 
maximally concentrated signals in the graph/dual domain, we 
express the conditions for recovering a band-limited signal 
from a subset of its values in terms of the properties of 
this basis. These conditions are equivalent to the conditions 
derived in 0 , (T7), (7), and ( 15 ). The novelty here is that our 
formulation shows a direct link between sampling theory and 
uncertainty principle. It is shown that the unique recovery of 
any signal from B, requires that there should be no nontrivial 
signal from B that is perfectly localized on S, i.e. one needs 
Bjr n T)g to be empty. There may be various choices of S 
satisfying this requirement, but each choice may significantly 
affect the stability of the recovery algorithm, so that selecting 
the sampling set S is a crucial step. Building on this idea, 
we propose several signal recovery algorithms and sampling 
strategies aimed to find an optimal sampling set S. In addition, 
we propose a frame-based reconstruction method that fits 
perfectly into the given sampling framework, as it relies on 
the properties of the projectors B and D. 

Finally, we compare our algorithms with the methods pro¬ 
posed in 0, 03 and with the benchmark resulting from 
the solution of a combinatorial problem (only for small size 
networks, where the combinatorial search is still manage¬ 
able). The comparison is carried out over a class of random 
graphs, namely the scale-free graphs, which are known for 

and 


modeling many real world situations, see, e.g. 1321, 
our techniques exhibit advantages in terms of Mean Square 
Error (MSE) and show performance very close to the optimal 
theoretical bound. We also show an example of selection of the 
sampling set for a real network, namely the IEEE 118 Bus Test 
Case, representing a portion of the American Electric Power 
System. 

3) Signal recovery in case of strong impulsive noise: 
Motivated by a sensor network scenario where some sensors 
may be damaged, we show under what conditions the recovery 
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of a band-limited signal can be unaffected by some sort of 
impulsive noise affecting a subset of nodes, using £i-norm 
minimization. Interestingly, we show that also this problem is 
inherently associated to the localization properties of projec¬ 
tors onto the graph and its dual domain. The rest of the paper is 
organized as follows. In Section [II] we derive the localization 
properties of graph signals, illustrating as a particular case 
the conditions enabling perfect localization in both vertex and 


frequency domains. Building on these tools, in Section III 


we derive an uncertainty principle for graph signals and, in 
Section IV we derive the necessary and sufficient conditions 
for recovering band-limited graph signal from its samples 
and propose alternative recovery algorithms. In Section [Y[ we 
analyze the effect of observation noise on signal recovery 
and, finally, in Section VI we propose and compare several 
sampling strategies. 


II. Localization properties 

Scope of this section is to derive the class of signals that are 
maximally concentrated over given subsets S and T in vertex 
and frequency domains. We say that a vector x is perfectly 
localized over the subset S C V if 

Y) x = x, (12) 

with D defined as in <0 Similarly, a vector x is perfectly 
localized over the frequency set T C V* if 


which implies that x is perfectly localized in the frequency 
domain. Now, using 0 and the Rayleigh-Ritz theorem, we 
can also write 

s*BDBj; x*Gx 

1 = max-= max-. (18) 

rtn^ rtn rtn^ ry 

tl/ i*' iL/ tt/ 

This shows that x satisfies also 0 . i-e-, x is also perfectly 
localized in the vertex domain. ■ 

Equivalently, the perfect localization properties can be ex¬ 
pressed in terms of the operators BD and DB. First of all, 
we prove the following lemma. 

Lemma 2.2: The operators BD and DB have the same 
singular values, i.e. cr, (BD) = cr.; (DB), i = 1,..., N. 

Proof: Since both matrices B and D are Hermitian, 
(BD)* = DB. But the singular values of a matrix coincide 
with the singular values of its Hermitian conjugate. ■ 

Combining Lemma 2.2 and ( [T4| , perfect localization onto the 
sets S and T can be achieved if and only if 


BDllo = IIDBIIo = 1. 


(19) 


As mentioned in Theorem 2.1 the vectors perfectly localized 


in both vertex and frequency domains must belong to the 
intersection set B (I'D, which is non-empty when the sum of 
dimensions of B and T> is greater than the dimension of the 
ambient space of dimension N. Hence, a sufficient condition 
for the existence of perfectly localized vectors in both vertex 
and frequency domains is 


|<S| + |^| > N. 


( 20 ) 


B x = x, (13) 

with B given in (|4|. Differently from continuous-time signals, 
a graph signal can be perfectly localized in both vertex and 
frequency domains. This is stated in the following theorem. 

Theorem 2.1: There exists a non trivial vector x, perfectly 
localized over both vertex set S and frequency set T (i.e. 
x € B^H'Ds) if and only if the operator BDB (or DBD) has 
an eigenvalue equal to one; in such a case, x is an eigenvector 
associated to the unit eigenvalue. 

Proof: Let us start proving that, if a vector x is perfectly 
localized in both vertex and frequency domains, then it must 
be an eigenvector of BDB associated to a unit eigenvalue. 
Indeed, by repeated applications of ( [T2| and ( | 1 3| , it follows 

BDBtc = BDs = Ba; = x. (14) 


This proves the first part. Now, let us prove that, if x is an 
eigenvector of BDB associated to a unit eigenvalue, then x 
must satisfy (12» and (p~3]>. Indeed, starting from 


BDBx = x 


(15) 


and multiplying from the left side by B, taking into account 
that B 2 = B, we get 


BDBs = Bcc (16) 

Equating ( | 1 5[ ) to 0 . we get 


Conversely, if |<S| + J 7 < N, there could still exist perfectly 
localized vectors, when condition ( [T9| is satisfied. 

Typically, given two generic domains S and J 7 , we may have 
signals that are not perfectly concentrated in both domains. In 
such a case, it is worth finding the class of signals with limited 
support in one domain and maximally concentrated on the 
dual one. For example, we may search for the orthonormal 
set of perfectly band-limited signals, i.e. satisfying Ba; = x , 
which are maximally concentrated in a vertex domain S. The 
set of such vectors {t/^} is constructed as the solution of the 
following iterative optimization problem, for i = 1,... ,N: 

vl>i = argmax |[Di/’i [| 2 

s-t- IIV’zlU = 1, (21) 

= ipi, 

= 0 , j = if i > 1 . 

In particular, if 1 is the band-limited signal with the highest 
energy concentration on S\ xjj 2 is the band-limited signal, 
orthogonal to ■0 1 , which is maximally concentrated on S, and 
so on. The vectors {t/^} are the counterpart of the prolate 
spheroidal wave functions introduced by Slepian and Pollack 
for continuous-time signals m ■ The solution of the above 
optimization problem is given by the following theorem. 

Theorem 2.3: The set of orthonormal ^-band-limited vec¬ 
tors with K := rankB, maximally concentrated 

over a vertex set S, is given by the eigenvectors of the BDB 
operator, i.e. 


Bcc = x, 


(17) 


BDBi/i, = Xiipi, 


( 22 ) 
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with Ai > A2 > ... > A k- Furthermore, these vectors are 
orthogonal over the set S , i.e. 


(^,D^) = A jSij, (23) 


where Sij is the Kronecker symbol. 

Proof: Substituting the band-limiting constraint within the 
objective function in 0- we get 


ifti = argmax ||DB-0^|| 2 

*i (24) 

s.t. 11^112 = 1, (Ai'l’j) = 0, j^i- 


Using Rayleigh-Ritz theorem, the solutions of ( |24| are the 
eigenvectors of (DB)* DB = BDB, i.e. the solutions of (221. 
This proves the first part of the theorem. The second part is 
proven by noting that, using Bi/^ = i/) i and B* = B, we 


obtain (^,BDB^.) = = XjSij. ■ 

The above theorem provides a set of perfectly ^-band-limited 
vectors that are maximally concentrated over a vertex domain. 
The same procedure can of course be applied to identify the 
class of orthonormal vectors perfectly localized in the graph 
domain and maximally concentrated in the frequency domain, 
simply exchanging the role of B and D, and thus referring to 
the eigenvectors of DBD. 


III. Uncertainty principle 

Quite recently, the uncertainty principle was extended to 
signals on graphs in GD by following an approach based 
on the transposition of the definitions of time and frequency 
spreads given by 0 and ([6j) to graph signals, as indicated 
in 0 and ®- However, as mentioned in the Introduction, 
the computation of spreads based on second order moments 
implies a definition of distance over a graph. Although the 
definition of distance used in ED- based on the shortest 
path between two vertices, is perfectly reasonable, there are 
alternative definitions of distance over a graph, such as the 
resistance distance |28| or the diffusion distance |29j. To 
remove any potential ambiguity associated to the definition 
of distance over a graph, taking inspiration by the seminal 
works of Slepian, Landau and Pollack (30), (5T), in this paper 
we resort to a definition of spread in the graph and frequency 
domain that does not imply any definition of distance. More 
specifically, given a pair of vertex set S and frequency set 
IF, denoting by a 2 and ft 2 the percentage of energy falling 
within the sets S and T, respectively, as defined in (IT) , our 
goal is to establish the trade-off between a and ft and find out 
the signals able to attain all admissible pairs. The resulting 
uncertainty principle is stated in the following theorem. 

Theorem 3.1: There exists a vector x such that ||a;|| 2 = 1, 


l|D*|| 2 : 

= a , 

l|Ba;|| 2 = 

B if and only if (a, 

ft) e r. 

where 

r = {(<*,£) 







cos -1 

cy. + 

COS 

IV 

COS &max 

(BD), 



cos -1 

3/r 

— a 2 

+ cos 

; _1 /3 > cos" 

& max 

(BD), 

(25) 

cos -1 

a + 

cos - 

Vi 

— /3 2 > cos" 

&max 

(BD), 


cos -1 

a/i 

— a 2 

+ cos 1 \/l — fi‘ 

2 > cos" 

&max 

(BD)}. 



Fig. 1: Admissible region F of unit norm signals x with || Da; ||2 = a 
and ||Ba;||2 = ft. 


Proof: The proof is reported in Appendix A. ■ 

An illustrative example of admissible region T is reported 
in Fig. |T] A few remarks about the border of the region F 
are of interest. First of all, if we take the equality signs in 
the inequalities given in (25] >, we get the equations describing 
the curves appearing at the four corners sketched in Fig. 
[T] namely upper right, upper left, bottom right and bottom 
left, respectively. The upper right corner of L, in particular, 
specifies the pairs (ct, ft) that yield the maximum concentration 
over both graph and dual domains. This curve has equation 


cos 


-1 


ct + cos 1 B = cos V m „(BD). 


(26) 


Solving (261 with respect to ft, and setting a; 
CT m aa: ( BD )> we g et 

fi = a (Jmax + \/(l - a 2 )(l - <7 max)- 


2 

max 


(27) 


Typically, for any given subset of nodes S, as the cardinality 
of T increases, this upper curve gets closer and closer to the 
upper right corner. The curve collapses onto a point, namely 
the upper right corner, when the sets S and T give rise 
to projectors D and B that satisfy the perfect localization 
conditions in In general, any of the four curves at 

the corners of region T in Fig. |T| may collapse onto the 
corresponding corner, whenever the conditions for perfect 
localization of the corresponding operator hold true. 

In particular, if we are interested in the allocation of energy 
within the sets S and T that maximizes, for example, the sum 
of the (relative) energies a 2 + ft 2 falling in the vertex and 
frequency domains, the result is given by the intersection of 
the upper right curve, i.e. (271, with the line a 2 + ft 2 = const. 


Given the symmetry of the curve (261, the result is achieved 
by setting a = ft, which yields 


— 5 ( 1 + &max 


)• 


(28) 


Using the derivations reported in Appendix A, the correspond¬ 
ing function f' may be written in closed form as 


r 


•01 - D^l / 1 + Omax nWT 

V / 2lTT &max ) y max 


(29) 
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where ip 1 is the eigenvector of BDB corresponding to cTj nax . 
More generally, we can find all the vectors whose vertex 
and spectral energy concentrations lie on the border of the 
uncertainty region f and construct the corresponding sets of 
orthonormal vectors by considering the following optimization 
problem 


IV. Sampling 

Given a signal x £ B defined on the vertices of a graph, 
let us denote by Xg £ T> the vector equal to x on the subset 
S C V and zero outside: 

xs := Dec. (35) 


fi = argmax lllB/JIa + (1 - 'y)l|D/ i ||| 

f*- Wfih=i (30) 

s-t. (fi,fj)= 0, j^i, 

where the parameter 7 , with 0 < 7 < 1 , controls the relative 
energy concentration in the vertex and frequency domains. The 
solution of this problem is given by the eigenvectors of the 
matrix 7 B + (1 — 7 )D. In particular, it is interesting to notice, 
as detailed in Appendix B, that the first K eigenvectors of this 
matrix, with K = rank(BD), are related to the eigenvectors 
0, ; associated to the I\ largest eigenvalues of BDB by the 
following relation 


fi =Pi 0 i +9iD0 it 


where 




with (T t := di (BD), and 


(31) 

(32) 

(33) 


OLi — 


N 


1 / 27 (cr 2 - 1) + 1 

2 l y/(l - 27) 2 - 47(7 - l)of 


+ 1 


(34) 


A numerical example is useful to grasp the advantages 
of tolerating some energy spill-over in representing a graph 
signal. The example is built as follows. We consider a random 
geometric graph composed of 100 vertices, where a set of 
nodes is deployed randomly within a finite area and there is 
an edge between two nodes if their Euclidean distance is less 
than a given coverage radius r 0 . To avoid problems with points 
close to the boundary of the deployment region, which would 
have statistics different from the internal nodes, we simulated a 
toroidal surface, so that all points are statistically equivalent in 
terms of graph properties, like degree, clustering, etc. Then, 
we picked a vertex 'i 0 at random and identify the set S as 
the ensemble of nodes falling within a distance R 0 from i 0 . 
Then we let If to increase and, for each value of Rq, we 
evaluate the cardinality of S and we build T as the set of 
indices {1,2,... ,k} enumerating the first k eigenvectors of 
the Laplacian matrix L, where k is the minimum number such 
that the (relative) spill-over energy 1 — a 2 = 1 — <J 2 iax is less 
than a prescribed value e 2 . In Fig. [ 2 ] we plot J 7 = k as a 
function of <S|, for different values of e 2 . The dashed line 
represents the case e 2 = 0: This is the curve of equation 
N = |5| + |.F|. The interesting result is that, as we allow for 
some spill-over energy, we can get a substantial reduction of 
the “bandwidth” IJ 7 ! necessary to contain a signal defined on 
a vertex set S. 


The necessary and sufficient condition for perfect recovery of 
x from xg is stated in the following theorem. 

Theorem 4.1: Given a sampled signal as in (35 1 , it is 


possible to recover x € B from its samples xg, for any x 7 B, 
if and only if 

||BD || 2 < 1, (36) 

i.e. if the matrix BDB does not have any eigenvector that is 
perfectly localized on S and band-limited on T. 


Proof: We prove first that condition (36 1 is sufficient for 


perfect recovery. Let us denote by Q a matrix enabling the 
reconstruction of x from xg as Qxg. If such a matrix exists, 
the corresponding reconstruction error is 

x — Q xg = x — Q(l — D) x = x — Q(l — DB) x, (37) 

where, in the second equality, we exploited the band-limited 
nature of x. This error can be made equal to zero by taking 
Q = (I — DB) . Hence, checking for the existence of Q is 
equivalent to check if (I — DB) is invertible. This happens if 
(36) holds true. Conversely, if 11BD || 2 = 1 and, equivalently, 

11 DB 11 2 = 1, from (19) we know that there exist band-limited 


signals that are perfectly localized over S. This implies that, if 
we sample one of such signals over the set S. we get only zero 
values and then it would be impossible to recover x from those 
samples. This proves that condition ([36| is also necessary. ■ 


Theorem 4.1 suggests also a way to recover the original signal 
from its samples as (I —DB) xg. Alternative recovery 
strategies will be suggested later on. Before considering the 
recovery algorithms, we note that, if x £ B then 


(I — DB x = DBtc. 


(38) 


The operator DB is invertible, for any x £ B, if the 
dimensionality of the image of DB is equal to the rankB, 
i.e. 

rank DB = rankB. (39) 

This condition is then equivalent to the condition of Theorem 


4.1 


In this case the singular vectors of DB corresponding to 
non-zero singular values constitute a basis for B. In general, 
both conditions ( [36] ) and (391 are equivalent to the sampling 
theorem conditions derived, for example, in G9 or 0 - 
The interesting remark here is that formulating the sampling 
conditions as in ( |36[ highlights a strict link between sampling 
theory and uncertainty principle. In fact, if we look at the top- 
left corner of the admissible region in Fig. [T| it is clear that 
if the signal is perfectly band-limited over a subset J 7 , then 
if 2 = 1. To enable signal recovery from a subset of samples 
S, we need to avoid the possibility that a 2 = 0, because this 
would make signal recovery impossible. From Fig.[T| it is clear 
that this is possible only if cr max (BD) < 1, i.e. if (B 6 k holds 
true, as stated in Theorem |4.1| More generally, if we allow for 
some energy spill-over in the frequency domain, so that we 
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|S| 

Fig. 2: Relation between the dimensions of the support over the 
vertex set S and the frequency domain T guaranteeing a spill-over 
energy s 2 . 


take /3 = /3 < 1, to avoid the condition a 2 = 0, we need to 
check that (J 2 nax (BD) < 1 — If 2 ■ Having conditions in this 
form is indeed useful to devise possible sampling strategies, as 
it suggests to take cr max (BD) as a possible objective function 
to be minimized. This topic will be addressed more closely in 


Section VI when dealing specifically with sampling strategies. 

The conceptual link between sampling theory and localiza¬ 
tion properties in the graph and dual domains is also useful to 
derive a signal recovery algorithm that builds on the properties 
of maximally concentrated signals described in Section [II] as 
established in the following. 

Theorem 4.2: If condition < 36 1 of the sampling theorem 
holds true, then any band-limited signal x £ B 
reconstructed from its sampled version x$ 
following formula 


<E 

e 


V 


can be 
by the 


FI 


x = ^2 T2( a: ‘S,V’i}V’i, 


(40) 


where K and {of} with K = \F\, are the 

eigenvectors and eigenvalues of BDB. 

Proof: For band-limited projection of any g, we can write 


K 


Bg = 


(41) 


i-1 


Because of (36 1 , there is no band-limited vector in B perfectly 
localized on S. Hence, all the eigenvectors from ker(BDB) 
belong to B , so that I\ = |F r |. Setting x = B g, since 
BDB^ = <J 2 'ip 1 with Oi f 0 for i £ J 7 , we can then write 


FI 

x = 

i=l 


FI 


, -^bdb 


(42) 


where we have used the property that the operators D and B 

i=l,-,FI 


are self-adjoint and the eigenvectors {t/h} i=1 in are band- 


limited. 

Let us study now the implications of condition ([36]) 
Theorem 4.1 on the sampling strategy. To fulfill (|36|>, 


need to guarantee that there exist no band-limited signals, i.e. 
Bx = x , such that BDx = x. To make (36) hold true, we 


must th en e nsure that BDx f x or, equivalently, recalling 
Lemma 


2.2 DBs x. Since 


Bx = x = DBx + DBx, 


(43) 



Covering radius, ro 

Fig. 3: Percentage of vanishing entries of the Laplacian eigenvectors 
of a RGG vs. coverage radius ro. 


we need to guarantee that DBx / 0. To this purpose, let us 
define the |<S| x \!F\ matrix G as 


G = 


Wii(ii) 

Ui 2 (j l) 


“iit?|S|) 

Ui 2 (j\s\) ■■ 

U i|jr|(j|5|) 


whose f-th column is the eigenvector of index i( of the 
Laplacian matrix (or any orthonormal set of basis vectors), 
sampled at the positions indicated by the indices j i,..., 
Condition ( |36| l is equivalent to require G to be full column 
rank. 

Indeed, the eigenvectors of a graph Laplacian may contain 
several vanishing elements, so that matrix G may easily loose 
rank. As an extreme case, if the graph is not connected, the 
vertices can be labeled so that the Laplacian (adjacency) matrix 
can be written as a block diagonal matrix, with a number of 
blocks equal to the number of connected components. Corre¬ 
spondingly, each eigenvector of L can be expressed as a vector 
having all zero elements, except the entries corresponding to 
the connected component, which that eigenvector is associated 
to. This implies that, if there are no samples over the vertices 
corresponding to the non-null entries of the eigenvectors with 
index included in J 7 , G looses rank. In principle, a signal 
defined over a disconnected graph can still be reconstructed 
from its samples, but only provided that the number of samples 
belonging to each connected component is at least equal to the 
number of eigenvectors with indices in T associated to that 
component. More generally, even if the graph is connected, 
there may easily occur situations where matrix G is not rank- 
deficient, but it is ill-conditioned, depending on graph topology 
and samples’ location. 

A numerical example is useful to grasp the criticality 
associated to sampling. In Fig. [3] we report the percentage of 
vanishing (< l.e — 10) entries of the Laplacian eigenvectors 
of a random geometric graph (RGG), composed of N = 100 
nodes uniformly distributed over a unit square, vs. coverage 
radius ro- The results shown in Fig. [3] are obtained by 
averaging over 100 independent realizations of RGG’s. The 
behavior of the curve can be explained as follows. The value of 
r o that ensures the graph connectivity with high probability is 
approximately ro ~ ->/log (N)/N ss 0.2. This means that, for 
ro < 0.2, there are disconnected components and this explains 














































the high number of zeros. For 0.2 < ro < 0.6 the graph is 
typically composed of a giant component and the number of 
vanishing entries is relatively low. Then, for 0.6 < 7 * 0 < 1.2 
the graph is connected with very high probability, but there 
appear clusters and the eigenvectors of the Laplacian may 
have several entries close to zeros as a way to evidence the 
presence of clusters. Finally, for r*o > 1.2 the graph tends to 
be fully connected and there are no zero entries anymore. We 
can see from Fig. [3] that the percentage of vanishing entries 
can be significant. This implies that the location of samples 
plays a key role in the performance of the reconstruction 
algorithm. For this reason. In Section [VI] we will suggest 
and compare a few alternative sampling strategies satisfying 
different optimization criteria. 

Frame-based reconstruction : The problem of sampling on 
graphs using frames for the space B was initially studied by 
@, ©, where the conditions for the existence of such frames 
were derived. Here we approach the problem using the above 
developed theory of maximally vertex-frequency concentrated 
signals on graph. First we provide some basic definitions of 
the frame theory [34) . 

Definition A set of elements \g 1 }, gX , is a frame for the 
Hilbert space Ft, if for all f £ FL there exist constants 
0 < A < B < oo such that 

Mf\\ 2 2<Y\(f^i)\ 2 <B\\f\\l ( 44 ) 

ie i 


Taking into account ( |39| > and Lemma |2.2| we conclude that 
the condition for a frame-based reconstruction based on a 
canonical-vector frames coincides with the condition of The¬ 
orem |4T] 

In general, however, the reconstruction based on the 
canonical-vector frame may be non robust in the presence of 
observation noise. For this reason, we generalize the sampling 
frame operator Ty by introducing the operator Ty as 

TV/ = BYDB/ = Y, f( u )Vu, (49) 

■u6<S 

where Y is a bounded matrix whose columns y,, without 
loss of generality, can be taken belonging to B, i.e. By, = 
y,, so that the image of Y is also ^-band-limited. Let us 
consider now the reconstruction of f £ B from its samples on 
S, based on Ty - This requires checking under what conditions 
the operator Ty- is bounded and invertible. Since the columns 
of YD corresponding to indices that do not belong to the set 
S are null, we can limit our attention to matrices Y that are 
invariant to the right-side multiplication by D, i.e. YD = Y. 
Finally, we arrive at the following sampling theorem. 

Theorem 4.3: Let T C V* be the set of frequencies and 
S C V be the sampling set of vertices and let Y : Bjr —y C N 
be an arbitrary bounded operator, then {ByJ ieS is a frame 
for Bjr if and only if 

rank BYDB = rankB. (50) 


Definition Given a frame {y,}j gI , the linear operator T : 
FL -£ FL defined as 

T f = Y(f>9i)9i (45) 

iei 

is called the frame operator. 

Constants A and B are called frame bounds, while the largest 
A and the smallest B are called the tightest frame bounds. 
It is useful to note that condition ( [44] ) guarantees the frame 
operator T to be bounded and invertible. 

Now, introducing the canonical basis vector S u , with u £ V, 
i.e. having all zero entries except the w-th entry equal to 1 , we 
investigate under what conditions a set of vectors {B £ u } ug5 
constitutes a frame for B. The frame operator in this case is 

T sf=Y B ^-)B<5 u = Y /(«) B *«- (46) 

nG«S u£<S 

First, we observe that the frame operator Ty, as defined in 
g 6 |, may be also expressed as 

Ty = BDDB = BDB. (47) 

Operator Ty has a spectral norm ||Ty|| 2 equal to cr^, aa . (BD). 
Hence, to guarantee that {Bb„ } |ig5 is a frame, it is sufficient 
to check when Ty is invertible, for any / £ B. The operator 
BD, on its turn, is invertible for any f £ B if its singular 
vectors, not belonging to its kernel, constitute a basis for the 
1 3F | -dimensional space B, or, formally, if and only if 


Proof: The proof follows directly from the invertibility 
conditions for the operator BYDB. ■ 

The tightest frame bounds, according to the Rayleigh-Ritz 
theorem, are defined by the minimum and maximum singular 
values of BYDB 


VminWfWl < Y < <^max\\f\\l, (51) 

u£S 

which is valid for every f £ B. As an example of matrix Y, 
encompassing the approaches proposed in fT9) and [35], we 
have the following frame operator 

T 1 / = BY 1 DB/ = ^/(u)BV (u) , (52) 

uE«S 


where is the indicator function of set A/"(it), defined as 

= 1, if v £ Af(u), and zero otherwise. In this case, 
the graph signal is supposed to be sampled sparsely in such 
a way that around each sampled vertex there is a non-empty 
neighborhood A f(u) of vertices that altogether could cover the 
whole graph. However, this choice is not necessarily the best 
one. In Section VI we will provide numerical results showing 
how the generalized frame-based approach can yield better 
performance results in the presence of observation noise. 


V. Reconstruction from noisy observations 

Let us consider now the reconstruction of band-limited 
signals from noisy samples, where the observation model is 


(48) 


r = D (s + to) , 


rank BDB = rankB. 


(53) 
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where n is a noise vector. Applying ( [40] ) to r, the recon- In such a case, we may resort to an l \-norm minimization, by 
structed signal s is formulating the problem as follows 


m 1 i^i j 

s = ^2^{Ds,'ip i )ip i + ^2^(Dn,'ip i )ip i . (54) 

i—l i= 1 

Exploiting the orthonormality of ■*/),-, the mean square error is 


l-F| 


MSE = E{\\~s-sf 2 }=E\ y £^\ ( Pn,^ i 

\F\ i 


= ^^ D E{nn*}D^, 


(55) 


In case of identically distributed uncorrelated noise, i.e. 
E {nn*} = /pi, using (23 i, we get 


MS£ G = £%tr (Dt/),t/i*D) 


(J, 

2=1 

1^1 z ,2 1^1 i 

= E^ tr «D^)=^E^2- (56) 

1—1 * 2=1 * 

Since the non-null singular values of the Moore-Penrose left 
pseudo-inverse (BD) + are the inverses of singular values of 
BD, i.e. A i ^(BDB) + ^j = A ” 1 (BDB), (56 1 can be rewritten 
as 

MSE G = p 2 n \\(BDB) + \\ F . (57) 


Proceeding exactly in the same way, the mean square error for 

(58) 


the frame-based sampling scheme (491 is 

MSE f = fa || (BYDB) + 


Based on previous formulas, a possible optimal sampling 
strategy consists in selecting the vertices that minimize ( |57j ) 
or (|58]>. This aspect will be analyzed in Section [VT| 


A. l\-norm reconstruction 

Let us consider now a different observation model, where 
a band-limited signal s £ B is observed everywhere, but a 
subset of nodes S is strongly corrupted by noise, i.e. 

r = s + Dn, (59) 

where the noise is arbitrary but bounded, i.e., ||n||i < oo. This 
model was considered in (36) and it is relevant, for example, in 
sensor networks, where a subset of sensors can be damaged 
or highly interfered. The problem in this case is whether it 
is possible to recover the signal s exactly, i.e. irrespective of 
noise. Even though this is not a sampling problem, the solution 
is still related to sampling theory. Clearly, if the signal s is 
band-limited and if the indices of the noisy observations are 
known, the answer is simple: s £ B can be perfectly recovered 
from the noisy-free observations, i.e. by completely discarding 
the noisy observations, if the sampling theorem condition ( [36] ) 
holds true. But of course, the challenging situation occurs 
when the location of the noisy observations is not known. 


s = arg mm | 


r-s ||i. 


(60) 


We will show next under what assumptions it is still possible to 
recover a band-limited signal perfectly, even without knowing 
exactly the position of the corrupted observations. 

To start with, the following lemma, which is known as the 
null-space property (37) , provides a necessary and sufficient 
condition for the convergence of (|60|). 


Lemma 5.1: Given the observation model (59 i, if for any 
s G B, 

IIDalk < ||Ds||!, (61) 


then the £i -reconstruction algorithm (60 1 is able to recover s 
perfectly. 

Proof: To prove this, we show first that for a signal 
consisting of noise only, i.e. r = Dn, the best band-limited 
/:-i - norm approximation g to this signal is the zero vector. In 
fact. 


IIDn - ff||i = ||D (n - g) ||i + ||Dg||i 

> IIDnlli-IIDfllli + UDgll! 

> IIDnUi. (62) 

Now, suppose instead that s p 0. We can observe that 

Ik - fllli = Ik + Dn - gr||! = ||Dn + (s - g) || 1} (63) 


i.e. the best band-limited approximation g to s is s. Since 
we proved before that, under ( | 6 l~| , the best band-limited 
approximation of Dn is the null vector, (631 is minimized 
by the vector s = s. ■ 

From the previous lemma it is hard to say if, for a given S 
and T, condition ( |6T[ ) holds or not. Next lemma provides such 
a condition. 

Lemma 5.2: Given the observation model (f59[). if 


max V (DB) 
je-F 


< min(DB) 
fe-F ^ I v 


(64) 


then the t\ -reconstruction method (601 recovers any signal 
s £ B perfectly, i.e. s = s. 

Proof: Since 


and 


sup ||DB ff || i = max V (DB), 
gee ^ , 1 

l!slli=i 


inf 1 1 DBtjf 11 1 = min V (DB) 

gets ’ 

l!g||i=i 


if ( |64| ) holds true, then 


l|DBg ||i ^ f 

sup ——- < ml 

gee llslli 9^ b 


DBg 

llfllli 


(65) 


( 66 ) 


(67) 


As a consequence, for every s £ B, ( [64] ) implies ( [61] ) and 
then, by Lemma HU it guarantees perfect recovery. ■ 

Besides establishing perfect recovery conditions. Lemma |5.2| 
provides hints on how to select the vertices to be discarded 
still enabling perfect reconstruction of a band-limited signal 
through the L\ -norm reconstruction. 
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Number of noisy samples, |«S| 

Fig. 4: Behavior of Mean Squared Error versus number of noisy 
samples, for different signal bandwidths. 


An example of ^ reconstruction based on (60 1 is useful 
to grasp some interesting features. We consider a graph 
composed of 100 nodes connected by a scale-free topology 
[ 32) . The signal is assumed to be band-limited, with a spectral 
content limited to the first IJ 7 ! eigenvectors of the Laplacian 
matrix. In Fig. [4] we report the behavior of the MSE associated 
to the ( \ -norm estimate in <|60f, versus the number of noisy 
samples, considering different values of bandwidth |^7 r |. As 
we can notice from Fig. [4] for any value of \3F\, there exists 
a threshold value such that, if the number of noisy samples 
is lower than the threshold, the reconstruction of the signal 
is error free. As expected, a smaller signal bandwidth allows 
perfect reconstruction with a larger number of noisy samples. 

We provide next some theoretical bounds on the cardinality 
of S and T enabling t \ - norm recovery. To this purpose, we 
start proving the following lemma. 

Lemma 5.3: It holds true that 



H D /I!l , 2,0, m 

sup < g |S| IT 7 ! , 

/es IIJ IIi 

(68) 

where g is defined as 



g := max . 

iGV 

(69) 

Proof: Let 

us consider the expansion formula for f £ B 

/( fc ) = 


Uj(k)u*{i) 


iGV iGV j’G J 7 

(70) 

which yields 



l/(fc)l < ll/lloo < V|/«|5> 2 = /z 2 |.F|| 

\f\\i, (71) 


iGV j EF 


or 

11 n 11 s. 11 if 11 OO 

11/1,1 -m 2 |jt 

(72) 

By combining 

||D/|[i< ||/|U|5| 

(73) 

with ([72}, we come to 



II Jilt 

(74) 


Theorem 5.4 (l\ -uncertainty): Let /, ||/||i = 1, be a signal 
a\ -concentrated to the set of vertices S, i.e. l|D/||i > a±, and 
^-band-limited to the set of frequencies T. i.e. ||B/| It > jSi, 
then 


\S\\?\> 


(cti +/3i - 1) 
M 2 (2-/3i) ' 


(75) 


Proof: If HB/Iji > B-\. then by definition there exists a 
g £ B such that ||g — /||i < 1 — and, for this g, we can 
write 


IIDffHr > \\Bfh - IID (fl - /) ||i > HD/IU - 1 + /3r (76) 
and 

IMIi < li/lli + 1 - A- (77) 


Therefore 

IlDflU 


\9\\i 


> 


D/lli -1 + Pi > on + 0i - 1 
||/||i + l — Pi 2 — /?! 


(78) 


Combining this result with the results of Lemma |5.3| we finally 
get { 73 }. ■ 

It is worth noting that an ^-uncertainty principle analogous to 
Theorem 5.4 may also be easily derived. Finally, we provide 
the condition for perfect reconstruction using ( |60} when S is 
not known. 

Theorem 5.5: Defining 


g := max \uj{i )\, 
iev 

if, for some unknown S , we have 


l-5| < 


1 


(79) 


(80) 


2^ 2 vr 

then the £i-norm reconstruction method ([60} recovers s £ B 
perfectly, i.e. s = s. for any arbitrary noise n present on at 
most |iS| vertices. 

Proof: For a band-limited signal s £ B satisfying © 


we can also write 


IDs! 


1 

< 2 


(81) 


On the other hand, from Lemma 5.3 we know that the supre- 
mum of the previous ratio among all s £ B is upper bounded 
by /i 2 |<S| IJ 7 !. Hence, by Lemma 5.1 all band-limited signals 
satisfying ( |80) satisfy also condition (811 or, equivalently <[6T}. 
for perfect L| -norm recovery. ■ 


VI. Sampling strategies 

When sampling graph signals, besides choosing the right 
number of samples, whenever possible it is also fundamental 
to have a strategy indicating where to sample, as the samples’ 
location plays a key role in the performance of reconstruction 
algorithms. Building on the analysis of signal reconstruction 
algorithms in the presence of noise carried out in Section |V| 
a possible strategy is to select the samples’ location in order 
to minimize the MSE. From ([57}, taking into account that 

A, (BDB) = af (BD) = of (SU’D), (82) 

the problem is equivalent to selecting the right columns 
of the matrix SU* in order to minimize the Frobenius 
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norm of the pseudo-inverse (SU*D) . This problem is 
combinatorial and NP-hard. The problem of selecting the 
columns from a matrix so as to minimize the Frobenius norm 
of its pseudo-inverse was specifically studied for example 


in 1261, so that we can take advantage of those methods for 
our purposes. In the sequel, we provide a few alternative 
strategies for selecting the samples’ locations. 


1) Greedy Selection - Minimization of Frobenius norm of 
(EU*D) + ; This strategy aims at minimizing the MSE in 
(56 1 . The method selects the columns of the matrix SU* 


so that the Frobenius norm of the pseudo-inverse of the 
resulting matrix is minimized. In case of uncorrelated noise, 
this is equivalent to minimizing X)i=i V 17 !- We propose a 
greedy approach to tackle this selection problem. The resulting 
sampling strategy is summarized in Algorithm 1. Note that 
S is the sampling set, indicating which columns to select, U 
denotes the matrix composed by the rows of U* corresponding 
to J 2 3 * * * 7 , and the symbol U _4 denotes the matrix formed with the 
columns of U belonging to set A. 


Algorithm 1 : Greedy selection based on minimum Frobe¬ 
nius norm of (SU*D) + 


Input Data : U, rows of U* corresponding to J 7 ; 

M, the number of samples. 

Output Data : 5, the sampling set. 


Function : 


initialize 5 = 0 

while |5| < M, set K = min(|5|, l-Fl) 

K 1 
s = arg min ^ 


j i= i CT i( u suO})' 


S <— S U {s}; 

end 


2) Maximization of the Frobenius norm of SU’D: The 
second strategy aims at selecting the columns of the matrix 
U in order to maximize its Frobenius norm. Even if this 
strategy is not directly related to the optimization of the MSE 
in (56) , it leads to a very easy implementation that shows 
good performance in practice, as we will see in the sequel. In 
particular, since we have 

max ||UDj|! = max y'||(U) i |||, (83) 

the optimal selection strategy simply consists in selecting the 
M columns from U with largest f^-norm. 

3) Greedy Selection - Maximization of the volume of the 
parallelepiped formed with the columns of U: In this case, the 
strategy aims at selecting the set S of columns of the matrix 
U that maximize the (squared) volume of the parallelepiped 

built with the selected columns of U in 5. This volume can 

~ + ~ 

be computed as the determinant of the matrix U 5 U 5 , i.e. 

IU 5 U 5 I = n£j^(UsUs), where A^UgUs) denote the 
eigenvalues of UjTJs, as far as |5| < |.F|. If |5| exceeds 
\F\, we take the product of the largest IJ 7 ! eigenvalues. 
The rationale underlying this approach is not only to choose 
the columns with largest norm, but also the vectors more 
orthogonal to each other. Also in this case, we propose a 


greedy approach, as described in Algorithm 2. The algorithm 
is similar, in principle, to the so called DETMAX algorithm 
mentioned in (25) , but is much simpler to implement because 
DETMAX, at each iteration, adds and deletes points until a 
convergence criterion is satisfied. Our algorithm, instead, starts 
including the column with the largest norm in U, and then it 
adds, iteratively, the column that gives the new highest value 
of U iS U,s|. The number of steps is then fixed and equal to the 
number of samples. Nevertheless, it looks suitable for graph 
signals because it exhibits very good performance, as shown 
later on. 

Algorithm 2 : Greedy selection based on maximum paral¬ 
lelepiped volume 

Input Data : U, rows of U* corresponding to J 7 ; 

M, the number of samples. 

Output Data : 5, the sampling set. 

Function : initialize 5 = 0 

while |5| < M, set K = min(|5|, IT 7 !) 

K 

s = arg max TT A^UjUs); 

0 

5<-5U {s}; 

end 


Comparison of sampling strategies: We compare now the 
performance obtained with the proposed sampling strategies, 
with random sampling and with two strategies proposed in the 
literature: 1 ) the method proposed in (7), aimed at maximizing 
the minimum singular value of SU’D: and 2) the approach 
proposed in (T 8 ), searching for the smallest sampling set 
enabling the unique recovery of a band-limited signal. We test 
the results using a scale-free (SF) random graph modeQ as 
this model encompasses many real world networks, see, e.g., 
1321. Fig. [5] reports the normalized MSE (NMSE), defined as 
the mean square error per node, divided by the noise variance, 
under two configurations: (a) N = 30, IJ 7 ! = 5; and (b) 
N = 200, IJ 7 ! = 10. In the case N = 30, we report also 
the benchmark obtained with the exhaustive search, whereas 
for N = 200 this choice is computationally too expensive. The 
additive noise in (53) is assumed to be an uncorrelated, zero 
mean Gaussian random vector with unit variance. The results 
shown in the figures have been obtained by averaging over 100 
independent realizations of graph topologies. We compare six 
different sampling strategies, namely: (i) the random strategy, 
which picks nodes randomly; (ii) the greedy selection method 
of Algorithm 1, minimizing the Frobenius norm of (XU*D) + 
(MinPinv); (iii) the Max Frobenius norm (MaxFro) strategy; 
(iv) the greedy selection method of Algorithm 2, maximizing 
the volume of the parallelepiped formed with the columns 
of SU’D (MaxVol); (v) the greedy algorithm (MaxSigMin) 
maximizing the minimum singular value of SU’D, recently 
proposed in |7); and (vi) the greedy algorithm searching for the 
smallest sampling set enabling unique recovery (MinUniSet), 
proposed in CD- It is worth to point out that the applica¬ 
bility of MinUniSet is limited to the case where the graph 

*We also tested all methods on random geometric graphs and the results 
were qualitatively similar. 
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Fig. 5: Normalized Mean Squared Error vs. number of samples for different sampling strategies and scale-free topology: 
(a) N = 30, l-FI = 5; (b) N = 200, | F\ = 10. 


signal is lowpass, i.e. its GFT has a support limited on the 
lowest indices. Hence, for the sake of making the comparison 
possible, we considered a lowpass signal. However, MaxVol, 
MinPinv and MaxSigMin are applicable to signals whose 
frequency support T is any subset of V*. Furthermore, in 
the implementation of MinUniSet it is necessary to specify an 
external parameter, namely the order k of the cut-off frequency 
(please, see m for details), which affects the performance of 
the method. In our test, we chose a value k = 10, as this value 
seemed to provide the best performance in the average. 

From Fig. [5] we observe that, as expected, the normalized 
mean squared error decreases as the number of samples 
increases. As a general remark, we can notice how random 
sampling performs quite poorly. This shows that, when sam¬ 
pling a graph signal, what matters is not only the number of 
samples, but also (and most important) where the samples are 
taken. Furthermore, we can notice how the proposed MaxVol 
and MinPinv strategies outperform all other strategies and 
approach very closely the optimal benchmark. The recently 
proposed MaxSigMin approach performs very close to the 
proposed MaxVol and MinPinv strategies when the number 
of samples is equal to its minimum value, i.e. |<S| = jJ 7 !, 
but MaxVol and MinPinv outperform MaxSigMin, when the 
number of samples assume intermediate values between IJ 7 ! 
and N. Furthermore, in such a case, comparing Figs. 0(a) 
and (b), we can see how the gain increases as the number of 
nodes increases. 

As an example of sampling set, in Fig. [6] we report an 
application to a real network: the IEEE 118 Bus Test Case, 
representing a portion of the American Electric Power System 
(in the Midwestern US) as of December 1962. This test graph 
is composed of 118 nodes. As illustrated in f38) , the dynamics 
of the power generators give rise to smooth graph signals, 
so that the band-limited assumption is justified, although in 
approximate sense. In our example, we consider a lowpass 
signal with IT 7 ! = 6 and we take a number of samples equal 
to 6. In Fig. [6] we report the network structure, where the 
color of each node encodes the entries of the eigenvector of 
L associated to the second smallest eigenvalue (these entries 
highlight clusters in the network, as shown in |22j). The 
green squares correspond to the samples selected using either 
MaxVol or MinPinv strategy, which provide the same result in 


this case. It is interesting to notice how each method assigns 
two samples per cluster and puts the samples, within each 
cluster, quite far apart from each other. This is just an example, 
but it suggests an interesting conceptual link with graph 
independent sets, which is worth of further investigations. 

Finally, we illustrate how to improve robustness to noise 
by using the frame-based reconstruction method. In ( [52j i, we 
provided a possible choice of frame operator to be used for 
sampling. In the following, we show how the mean square 
error MSEp in (|58)i behaves for different choices of graph 


covering sets J\F(v) used in (52 1 . For this example, we consider 
a (thorns) random geometric graph having 100 nodes with 
connectivity radius ro = 0.1883. We consider two sampling 
strategies, namely: (i) the random strategy; (ii) the MaxVol 
strategy illustrated in Algorithm 2. Around each sample, taken 
at vertex v, the local set Af(v) is composed of the nodes 
falling inside a ball of radius r\ centered on v. The local sets 
associated to each sample can intersect each other and their 
union does not necessarily cover the whole graph. In Fig. [7] 
we show the normalized MSE as a function of n normalized 
to ro- We can see from Fig.[7]that there exists an optimal size 
of covering local-sets which minimizes the mean square error. 
An intuitive explanation of the behavior shown in Fig. [7]is that, 
for small values of rq, as rq increases, the local sets around 
each sample help reducing the MSE. However, as rq exceeds 
a certain threshold, the covering sets significantly overlap with 
each other, giving rise to a frame with more dependent vectors, 
in which case the MSE starts increasing again. Furthermore, 
we can see how, increasing the number of samples, for a 
given bandwidth, the normalized MSE decreases. Finally, we 
can notice how the MaxVol strategy outperforms the random 
sampling, especially for low number of samples. 


VII. Conclusion 

In this paper we have presented a framework for the 
analysis of graph signals that, starting from the localization 
properties over the graph and its dual domain, yields an 
uncertainty principle and establishes a useful conceptual link 
between uncertainty principle and sampling. The approach 
is applicable to any unitary transformation from a discrete 
domain to the transformed one. Besides its conceptual interest, 
the relation between uncertainty principle and sampling theory 
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Fig. 7: Normalized mean square error vs. the ratio n/ro. 


provides suggestions on how to identify sampling strategies 
and recovery algorithms robust against additive observation 
noise. Interesting further developments include the extension 
to hypergraphs, the robustness analysis in the case of non 
perfectly band-limited signals and the identification of further 
robust recovery algorithms, including the design of optimal 
frame bases. 

Appendix A 
Proof of TheoremI3.1I 


proving Theorem 3.1 The proof basically follows the same 
procedure of where it was initially stated for continuous¬ 
time signals. 

Using the usual definition of the scalar product (a, b) = 
a*b, we can define the angle between two vectors 0(a. b) as 

3?(a, b) 


9(a , b) = cos 


.NHI&lh 

By Schwartz inequality (a, 6) < ||a|| 2 1|b ||2 
|3?(a,6)| < |(a, 6)| it is clear that 

5f(a, 6) 

-1 < — x ' < 1 

“ I|a|| 2 ||6|| 2 - 

and 9(a, 6) = 0 only if 6 = const-a, i.e. when two vectors are 
colinear. Now, let us consider two vectors f £ B and g £ T). 
For the beginning let us consider a fixed function f £ B and 
an arbitrary g £ V. In this case the following lemma gives us 
an achievable lower bound of 9 (f,g). 

Lemma A.l: For a given vector / there exists 

IID/II2 


inf 9(f,g) = cos' , 

a^ D II/II2 

which is achieved by g = fcD/ for any k > 0. 
Proof: For any g £ V it holds 


and 


K(/,S><|</,s)| = |(D/,g) 
KD/, ff )|<||D/|| 2 -||s|| 2 . 


So we can write 

m-a) 


l/ll: 


< 


119112 


ID/ll: 

II/II2 


t(/.D/> 
|D/ll2 -11/112 


Taking into account that cos 6 decreases monotonically in 
[0, 7 r], it follows that for any g £ V 

8(f,g) 

with equality when g and D/ are proportional. ■ 

If the quantity 


On 


gdV 


( 86 ) 


Before proceeding to the proof, we introduce some useful 
notation and provide several results that will be used for 


is assumed by some specific f £ B and g £ V then we 
will say that B and V form the minimum angle 0 rrnni which 
is called the first principal angle J39j, and is given by the 
following theorem. 

Theorem A.2: The minimum angle between B and V 
exists and equals to 


9m.in. — COS 


(BD), 


(87) 


(84) 

and the fact that 


and is achieved by / = tp 1 and g = Dt /) l5 where 
is an eigenvector of BDB corresponding to the eigenvalue 
C (BD). 

we can write 


Proof: Using the result of Lemma A.l 


= inf cos 
feB 


-1 \ (f 1 D/) I 

II/II2 ^ 


. r a / r \ • r -1 IID/II2 

ml 0(f.g) = inf cos ... — 

/es f&B ||/||2 

g£T> 

where infimum on the left side is achieved if the infimum on 
the right side is achieved. Since cos 9 decreases monotonically 


(85) 


in [0, 7 r], we can apply the result of Theorem 2.3 from which 
it follows that infumum is achieved by the eigenvector • 0 1 of 
BDB corresponding to the maximum eigenvalue (BD). 
Therefore we conclude that 

.-1 l</,D/)| 


9 


mm — itlf COS 
feB 


\fb 


= cos 1 <7 „ 


(BD) 


Notice that, under perfect localization conditions, i.e. Theorem 


2.1 o' max (BD) = 1 and the minimum angle is 0, thus im¬ 
plying that there are some vectors which lie in both subspaces 
B and V. Next, we derive, without loss of generality, which 
values of j3 are attainable for every choice of a, assuming unit 
norm vectors /. 

The case a = 1 means that all the energy of signal is 


supported only on <S. According to (21 1 and Lemma 2.2 the 
minimally concentrated on T vector from V is the eigenvector 
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of DBD, corresponding to the eigenvalue cr^ lQa ,(DB), while 
the maximally concentrated on T vector from V is the eigen¬ 
vector of DBD, corresponding to the eigenvalue <T^ ax (DB). 
Therefore 

(DB) ( 88 ) 


inf fj 2 = 1 — a 
f&T> 

\\fh=i 


2 

max 


and 


sup p 2 = a: 
feT) 

11 / 112=1 


2 

max 


(DB) 


cos 


-1 


a + COS j3 > COS 1 (7 max (BD). 


We can decompose any vector / as 

/ = ad/ + 7 b/ + 9 , 


— 23?(D/, B/) = —a 2 + 1- 


i(D/,B/)r 


a 2 P 2 

l(D/,B /)| 2 

a 2 /? 2 


Mi 


i - 


According to (84 1 we define 

cos 9 = 5R t 


<D/,B/) 


(93) 


(94) 


o<i-MMT <1 _ ra 2, 

a 2 p 2 


where equality can be achieved if and only if g = 0 and 


(D/,B/) is real. Next, from (98 1 we can write 

P < cos ( 9 — cos -1 a), (99) 

from which it follows, using bound (|95), that 


P < COS (cos 1 (Jmax (BD) — COS 1 Of) 


(89) 


( 100 ) 

is 


and we immediately come to ( |90| . Equality in (lOOl i 
achieved by 

f = P ip 1 +qDip 1 , ( 101 ) 


for the case a = 1. All the values in between are attain¬ 
able by the function / = 1 with X)[=i a i = 1 ’ 

where {i{’i}i=i..K are the eigenvectors of DBD belonging 
to V and corresponding to the eigenvalues from the interval 
[l-a i mu (DB),C(DB)]. 

Next let us consider the behavior of P for a belonging to 
a £ (0, 1). First, we will show that 


with 


1 — a 2 


P = 


1 - u 2 

max 


(BD)’ 


q = 


1 — a 2 


(BD) 


1 — (T^ 

^ max 


(BD) 


( 102 ) 


(103) 


(90) 


(91) 


where g is a vector orthogonal to both B and V and again we 
consider a unit norm / with ||D/1 | 2 = a. Our goal is to find 
the nearest vector to f in the space spanned by D/ and B/. 

First, we calculate the inner products of © successively 
with /,D/,B/ and g and arrive to the system of equations 

1 = Aa 2 + 7^ 2 + (g, f), 

a 2 = Act 2 + 7 (B/, D/), 

/? 2 =A(D/,B/)+ 7 /? 2 , ^ Z) 

(f,g ) =(g,g)- 

After eliminating (g,f), A and 7 from the above system we 
arrive to 


and where is an eigenvector of BDB corresponding to the 
eigenvalue (BD). In (102» and (103i it was supposed 
that er 2 lax (BD) < 1, because in the case cr^j ax (BD) = 1 
there exists at least one vector belonging to both B and V, 
therefore point with a = 0 and /3 = 1 belongs to I . 

To demonstrate that f' stays on the boundary of the 
uncertainty region T, we first rewrite ( | 100 [ ) as 

P < a a max (BD) + ^(1 - « 2 )(1 - <J 2 max (BD)). (104) 

Vertex and frequency energy concentrations for j' are given 
by 


«/' = l|D /'|| 2 = (p + q)(Tmax (BD), 
(3 f/ = \\Bf\\ 2 =p + qa 2 max (BD) 


(105) 

(106) 


|D/|| 2 ||B/|| 2 - 

Because we measure the angle 9 between D / £ V and B/ £ 
B, according to Theorem |A.2[ 

9 > cos ^ 1 (Jmax (BD). (95) 

Due to the fact that 

ap cos 9 = K(D/, Bf) < \ (D/, Bf) \ < a p, (96) 
we can write 


(97) 


In (93 i, after introduction of 9, completion of the square on 
the left-hand side and use of ( [97) , we finally arrive to 

(P — a cos 9 ) 2 < (l — a 2 ) sin 2 9, 


Substituting ap and f3p in (104 1 we immediately obtain 
equality. 

Applying the same steps between (90i and (lOOi to 
the operators BD, BD and BD, we obtain the three 
remaining inequalities in ( [25) . For j3 = 1 and a £ 
[1 - <Jm ax (BD) , (Jmax (BD)] the concentrations are achiev¬ 
able by the eigenvectors of BDB which belong to B and their 
linear combinations. Continuing by analogy one can show that 
all the values a and P belonging to the border of T (see Fig. 
[T) are achievable. All the points inside T are achievable by 
the functions build up from different combinations of left and 
right singular vectors of BD, BD, BD and BD. 

Appendix B 

Maximally concentrated dictionary for 

DIFFERENT CONCENTRATIONS IN VERTEX AND FREQUENCY 
Fet us consider the following optimization problem 

fi = argmax 7 ||B/J^ + (1 - qOHD/Jl! 

ft- II/i II2=1 (107) 

s-t. (fi,fj)= 0, j^i, 

with parameter 0 < 7 < 1 controlling the relative energy 
concentration in vertex and frequency domains. The solution 


of (107 1 is given by the eigenvectors of the self-adjoint 


(98) 


operator 


( 7 B + (1 - 7)D) f t = Ui U 


(108) 
























15 



3 2 = i 

3 2 = i 
^2 _ , 


Fig. 8 : Position of the first three maximally concentrated vectors in 
the region F for 7 = 0.75. 


according to the Rayleigh-Ritz theorem. Each value of 7 
corresponds to one point on the curve (27 1 in a way that 
the vector /, maximizing ( 107| > has energy concentrations 
(a/, /3f) lying on the curve (27 1 . Hence, the solution of (107 1 
is achieved at the tangent point of the curve (27 1 with the line 

(1 — 7 )a 2 + 7/3 2 = uj\. (109) 


Solving the above geometric problem we obtain the pair a, (3 
given by 


a f 1 = 


2 7 (<tL -1) + 1 


\ 2 V V( l - 2 7) 2 - 47(7 - l)Cm, 


+ 1 


fifi — Gmax H"~ 
where <7 max •= cr m.a: 


1 _ a /i )( 1 ~ °i 


2 

max 


( 110 ) 


(ill) 


(BD). The eigenvalue ay is provided 
by (109 1 , i.e. ay = (1 — 7 )a 2 fi + 7 / 3 ^. The first vector of 
the solution of (|107|>, /,, may be expressed in terms of "0 i 


simply by putting otf 1 into ( 101 1 , ( p~02] > and ( fl03| ). 

Moreover, the first K := rankBD orthogonal vectors {/,,} 
giving the solution of (| 107[> may be constructed b y sub sti tuting 


vanous at 


(BD) instead of a‘^ nax ( BD) into (102 1 , (103», 


(JTTO]) and then into ( | 101 1 >. To demonstrate this we consider 
vectors /, of the form 


with 


and where 


fi =Piip i + q l T)'iJ> i , 



( 112 ) 


(113) 


(114) 


\ 


1 / 2 7 (of - 1) + 1 

2 l y/(l - 27)2 - 47(7 - l)af 


+ 1 


(115) 


For the sake of shortness we used ay := a, (BD) above. 
First, it is easy to see that for different i,j = 1,... ,K the 


vectors given by ( |1 1 2| > are mutually orthogonal. Secondly we 
want to demonstrate that vectors of the form ( fTT2l i are the 
eigenfunctions of ( 7 B + (1 — 7 )D). We show this by direct 
substitution, i.e. we have 

( 7 B + (1 - 7 )D)/ i = (7 Pi + 7 o^q^i (116) 

+ (1 ~j)(Pi J rq i )T)tp i . 


Thus f i is an eigenvector of ( 7 B + (1 — 7 )D) if and only if 
the following equality holds true 

7 + 7of— = (1-7) fl- + —) • (H7) 

Vi V Qi J 

Substituting p, and q, from ( | 11 3[ > and ( | 1 14[ > we easily demon¬ 
strate that equality holds. Eigenvalues ay are given by 


= (1 - 7 ) 1 + 


Qi 


(118) 


In Fig. [ 8 ] we provide an illustration showing the vertex and 
frequency energy concentrations of the first three /, for the 
case of 7 = 0.75, a\ = 0.85 ,o\ = 0.7 and erf = 0.55. The 
corresponding eigenvalues of ( |108| > in this case were found to 
be ay = 0.971036, uy = 0.94017 and w 3 = 0.906971. 

Using expression (112 1 we have found the first K vectors 
maximizing (107 1 . The remaining N — K vectors can be 
expressed in a similar way through the maximally concentrated 
eigenvectors of the operators BDB, BDB and BDB. 
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